The goals for this analysis are to investigate how the gene expression changes based on the mouse model, genotype within the mouse model, and age. Questions to answer: -What genes are deferentially expressed due to Mouse Model Alone? -How does aging affect gene expression within each mouse model? -How does the genotype within each mouse model affect gene expression at different time points?
library(DESeq2)
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call,
duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply, Map,
mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit, which.max,
which.min
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
expand.grid, I, unname
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats
Attaching package: ‘MatrixGenerics’
The following objects are masked from ‘package:matrixStats’:
colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs,
colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps,
colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds,
colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates, colVarDiffs,
colVars, colWeightedMads, colWeightedMeans, colWeightedMedians, colWeightedSds,
colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet, rowCollapse, rowCounts,
rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs,
rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins, rowOrderStats,
rowProds, rowQuantiles, rowRanges, rowRanks, rowSdDiffs, rowSds, rowSums2, rowTabulates,
rowVarDiffs, rowVars, rowWeightedMads, rowWeightedMeans, rowWeightedMedians, rowWeightedSds,
rowWeightedVars
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor,
see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: ‘Biobase’
The following object is masked from ‘package:MatrixGenerics’:
rowMedians
The following objects are masked from ‘package:matrixStats’:
anyMissing, rowMedians
Warning message:
R graphics engine version 15 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed.
library(pheatmap)
library(dplyr)
Attaching package: ‘dplyr’
The following object is masked from ‘package:Biobase’:
combine
The following object is masked from ‘package:matrixStats’:
count
The following objects are masked from ‘package:GenomicRanges’:
intersect, setdiff, union
The following object is masked from ‘package:GenomeInfoDb’:
intersect
The following objects are masked from ‘package:IRanges’:
collapse, desc, intersect, setdiff, slice, union
The following objects are masked from ‘package:S4Vectors’:
first, intersect, rename, setdiff, setequal, union
The following objects are masked from ‘package:BiocGenerics’:
combine, intersect, setdiff, union
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(dendextend)
---------------------
Welcome to dendextend version 1.15.2
Type citation('dendextend') for how to cite the package.
Type browseVignettes(package = 'dendextend') for the package vignette.
The github page is: https://github.com/talgalili/dendextend/
Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
You may ask questions at stackoverflow, use the r and dendextend tags:
https://stackoverflow.com/questions/tagged/dendextend
To suppress this message use: suppressPackageStartupMessages(library(dendextend))
---------------------
Attaching package: ‘dendextend’
The following object is masked from ‘package:stats’:
cutree
design_matrix<-read.table('/Users/tasnimtabassum/Documents/Transcriptomics_SP22/Experimental_Design_TG.csv',sep=',',header=TRUE)
head(design_matrix)
rownames(design_matrix)<-design_matrix$Sample
design_matrix$Sample<-NULL
design_matrix
Import matrix containing the counts for all samples. These counts represent the forward strand counts since this was a stranded library.
counts_matrix<-read.table("/Users/tasnimtabassum/Documents/Transcriptomics_SP22/GitHub/Transcriptomics-Final-Project-/Count_Tables/allcounts.csv",sep=',',header=TRUE)
counts_matrix
Because the numbers after the dot in the ensembl IDs represent versions of genes in certain annotations, we can remove these to more easily conduct our differential gene expression analysis.
counts_matrix$V1<-gsub("\\..*","",counts_matrix$V1)
counts_matrix
# remove the "V1" from col 1
rownames(counts_matrix)<-counts_matrix$V1
counts_matrix$V1<-NULL
head(counts_matrix)
We need to sort the counts_matrix as well as our experimental design matrix in order to run DESEQ2
counts_matrix<-counts_matrix[,order(colnames(counts_matrix))]
counts_matrix
design_matrix<-design_matrix[order(rownames(design_matrix)),]
design_matrix
dds <- DESeqDataSetFromMatrix(countData = counts_matrix,
colData = design_matrix,
design = ~ Genotype + Age+ Model)
some variables in design formula are characters, converting to factors the design formula contains one or more numeric variables with integer values,
specifying a model with increasing fold change for higher values.
did you mean for this to be a factor? if so, first convert
this variable to a factor using the factor() function
the design formula contains one or more numeric variables that have mean or
standard deviation larger than 5 (an arbitrary threshold to trigger this message).
Including numeric variables with large mean can induce collinearity with the intercept.
Users should center and scale numeric variables in the design to improve GLM convergence.
dds
class: DESeqDataSet
dim: 46075 72
metadata(1): version
assays(1): counts
rownames(46075): ENSMUSG00000000001 ENSMUSG00000000003 ... N_noFeature N_unmapped
rowData names(0):
colnames(72): SRR8512301 SRR8512302 ... SRR8512439 SRR8512440
colData names(3): Model Genotype Age
Remove genes with counts less than 10
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
Run DESeq
dds <- DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 5166 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
rawcounts.matrix <- counts(dds,normalized=F)
normalizedcounts.matrix <- counts(dds,normalized=T)
vst_dds <- vst(dds)
dists <- dist(t(assay(vst_dds)))
dists
SRR8512301 SRR8512302 SRR8512303 SRR8512304 SRR8512307 SRR8512308 SRR8512309 SRR8512310
SRR8512302 34.98989
SRR8512303 24.75489 33.13232
SRR8512304 29.32833 34.57388 24.73419
SRR8512307 21.09034 36.71252 24.89597 30.84387
SRR8512308 31.23042 19.83868 32.57351 35.74883 31.15181
SRR8512309 25.47078 41.31314 22.72178 29.07433 22.56086 37.35232
SRR8512310 23.98892 29.27166 21.56907 25.83143 22.57367 29.23152 25.59670
SRR8512313 22.22640 34.21704 22.53681 25.42066 21.83970 32.57706 20.81997 16.75686
SRR8512318 24.51173 34.21059 21.70783 30.77728 26.62417 33.64663 28.42689 22.88748
SRR8512319 37.11827 20.57906 34.48485 35.85855 37.39977 20.50508 40.79025 32.15999
SRR8512324 18.20493 35.47741 24.93245 30.50401 19.92692 29.94846 24.55149 24.51515
SRR8512325 31.55144 46.28219 37.96788 40.33317 34.98445 44.23550 36.88122 35.36607
SRR8512326 22.23004 32.71514 26.44609 28.07123 25.84814 30.21772 28.13695 24.39279
SRR8512329 21.08136 33.40568 19.13528 26.81156 20.09536 30.27128 18.33309 20.77168
SRR8512313 SRR8512318 SRR8512319 SRR8512324 SRR8512325 SRR8512326 SRR8512329 SRR8512330
SRR8512302
SRR8512303
SRR8512304
SRR8512307
SRR8512308
SRR8512309
SRR8512310
SRR8512313
SRR8512318 25.25760
SRR8512319 35.60471 38.89835
SRR8512324 22.96872 27.10411 35.95050
SRR8512325 32.69575 39.19733 47.68236 30.97220
SRR8512326 23.49090 30.52538 33.49744 21.40496 33.68735
SRR8512329 20.09278 21.85412 33.85347 21.36197 35.52867 24.62444
SRR8512331 SRR8512332 SRR8512335 SRR8512336 SRR8512341 SRR8512342 SRR8512347 SRR8512348
SRR8512302
SRR8512303
SRR8512304
SRR8512307
SRR8512308
SRR8512309
SRR8512310
SRR8512313
SRR8512318
SRR8512319
SRR8512324
SRR8512325
SRR8512326
SRR8512329
SRR8512349 SRR8512350 SRR8512351 SRR8512352 SRR8512353 SRR8512354 SRR8512363 SRR8512364
SRR8512302
SRR8512303
SRR8512304
SRR8512307
SRR8512308
SRR8512309
SRR8512310
SRR8512313
SRR8512318
SRR8512319
SRR8512324
SRR8512325
SRR8512326
SRR8512329
SRR8512365 SRR8512366 SRR8512371 SRR8512372 SRR8512373 SRR8512374 SRR8512375 SRR8512376
SRR8512302
SRR8512303
SRR8512304
SRR8512307
SRR8512308
SRR8512309
SRR8512310
SRR8512313
SRR8512318
SRR8512319
SRR8512324
SRR8512325
SRR8512326
SRR8512329
SRR8512379 SRR8512380 SRR8512381 SRR8512382 SRR8512384 SRR8512385 SRR8512386 SRR8512387
SRR8512302
SRR8512303
SRR8512304
SRR8512307
SRR8512308
SRR8512309
SRR8512310
SRR8512313
SRR8512318
SRR8512319
SRR8512324
SRR8512325
SRR8512326
SRR8512329
SRR8512389 SRR8512390 SRR8512399 SRR8512400 SRR8512401 SRR8512402 SRR8512405 SRR8512406
SRR8512302
SRR8512303
SRR8512304
SRR8512307
SRR8512308
SRR8512309
SRR8512310
SRR8512313
SRR8512318
SRR8512319
SRR8512324
SRR8512325
SRR8512326
SRR8512329
SRR8512407 SRR8512408 SRR8512411 SRR8512412 SRR8512426 SRR8512427 SRR8512428 SRR8512429
SRR8512302
SRR8512303
SRR8512304
SRR8512307
SRR8512308
SRR8512309
SRR8512310
SRR8512313
SRR8512318
SRR8512319
SRR8512324
SRR8512325
SRR8512326
SRR8512329
SRR8512430 SRR8512431 SRR8512432 SRR8512433 SRR8512434 SRR8512435 SRR8512439
SRR8512302
SRR8512303
SRR8512304
SRR8512307
SRR8512308
SRR8512309
SRR8512310
SRR8512313
SRR8512318
SRR8512319
SRR8512324
SRR8512325
SRR8512326
SRR8512329
[ reached getOption("max.print") -- omitted 57 rows ]
TO DO: *****Label samples
dendogram<-plot(hclust(dists),cex=0.6)
labels(dendogram)
character(0)
Trying to rename labels….
dend <- dists %>% hclust %>% as.dendrogram
dend %>% set("labels", c(rep('rtg4510',26),rep('J20',42),rep('rtg4510',4))) %>% set("labels_cex", 0.5) %>% plot
`
*** Add title, make graph cute (size bigger) Capitalize group
plotPCA(vst_dds,intgroup=c("Model","Genotype","Age"))
res_1 <- results(dds, contrast = c("Genotype","rtg4510","WT_TG"))
res1_ordered <- res_1[order(res_1$pvalue),]
head(res1_ordered,200)
log2 fold change (MLE): Genotype rtg4510 vs WT_TG
Wald test p-value: Genotype rtg4510 vs WT_TG
DataFrame with 200 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000079037 14497.968 1.262646 0.0659797 19.1369 1.24473e-81 2.38740e-77
ENSMUSG00000021171 692.285 -1.037739 0.0642636 -16.1482 1.17004e-58 1.12207e-54
ENSMUSG00000056553 4190.567 -1.151315 0.0796032 -14.4632 2.07048e-47 1.32373e-43
ENSMUSG00000018411 7052.860 0.863156 0.0677374 12.7427 3.42466e-37 1.64213e-33
ENSMUSG00000000805 334.382 1.055046 0.0836956 12.6057 1.96305e-36 7.53026e-33
... ... ... ... ... ... ...
ENSMUSG00000021886 4.84672 3.074444 0.5178245 5.93723 2.89874e-09 2.85117e-07
ENSMUSG00000021032 107.68873 -0.889651 0.1499263 -5.93392 2.95783e-09 2.87998e-07
ENSMUSG00000060176 36.76433 -0.755378 0.1272986 -5.93391 2.95807e-09 2.87998e-07
ENSMUSG00000022358 192.95473 0.589465 0.0998653 5.90260 3.57808e-09 3.46604e-07
ENSMUSG00000037946 104.42285 0.389471 0.0660163 5.89962 3.64344e-09 3.51162e-07
Filter the res1_ordered byp value to obtain only those genes with p value less than 0.05
#Filter Differentially Expressed Genes by p value 0.05
res1_ordered_filtered <- res1_ordered[res1_ordered$pvalue<0.05,]
See how many genes have p value equals to zero
res1_ordered_filtered_2 <- res1_ordered[res1_ordered$pvalue==0,]
Genes with pvalue==0
res1_ordered_filtered_2
log2 fold change (MLE): Genotype rtg4510 vs WT_TG
Wald test p-value: Genotype rtg4510 vs WT_TG
DataFrame with 0 rows and 6 columns
Genes with pvalue<0.05
res1_ordered_filtered
log2 fold change (MLE): Genotype rtg4510 vs WT_TG
Wald test p-value: Genotype rtg4510 vs WT_TG
DataFrame with 5604 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000079037 14497.968 1.262646 0.0659797 19.1369 1.24473e-81 2.38740e-77
ENSMUSG00000021171 692.285 -1.037739 0.0642636 -16.1482 1.17004e-58 1.12207e-54
ENSMUSG00000056553 4190.567 -1.151315 0.0796032 -14.4632 2.07048e-47 1.32373e-43
ENSMUSG00000018411 7052.860 0.863156 0.0677374 12.7427 3.42466e-37 1.64213e-33
ENSMUSG00000000805 334.382 1.055046 0.0836956 12.6057 1.96305e-36 7.53026e-33
... ... ... ... ... ... ...
ENSMUSG00000036208 80.7554 0.1211650 0.0617908 1.96089 0.0498920 0.176325
ENSMUSG00000052926 336.0837 -0.1012596 0.0516416 -1.96081 0.0499006 0.176325
ENSMUSG00000047242 413.5181 0.0903139 0.0460635 1.96064 0.0499211 0.176365
ENSMUSG00000024862 2200.1750 -0.1517860 0.0774291 -1.96032 0.0499580 0.176463
ENSMUSG00000029802 305.8101 0.1549010 0.0790213 1.96024 0.0499673 0.176463
res_2 <- results(dds, contrast = c("Age","2","8"))
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues, :
Age is not a factor, see ?results
?results
-How does the genotype within each mouse model affect gene expression at different time points?
#GO-Term Enrichment
Install Mouse annotation library:
BiocManager::install("org.Mm.eg.db")
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details
replacement repositories:
CRAN: https://cran.rstudio.com/
Bioconductor version 3.15 (BiocManager 1.30.17), R 4.2.0 (2022-04-22)
package(s) not installed when version(s) same as current; use `force = TRUE` to re-install: 'org.Mm.eg.db'
install.packages("devtools")
Error in install.packages : Updating loaded packages
devtools::install_github("stephenturner/annotables")
Downloading GitHub repo stephenturner/annotables@HEAD
checking for file ‘/private/var/folders/6y/jp6148xn7bj2wzbf3h_c4xt00000gn/T/RtmpNSybFS/remotes14c43a534409/stephenturner-annotables-631423c/DESCRIPTION’ ...
✔ checking for file ‘/private/var/folders/6y/jp6148xn7bj2wzbf3h_c4xt00000gn/T/RtmpNSybFS/remotes14c43a534409/stephenturner-annotables-631423c/DESCRIPTION’
─ preparing ‘annotables’:
checking DESCRIPTION meta-information ...
✔ checking DESCRIPTION meta-information
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
NB: this package now depends on R (>= 3.5.0)
WARNING: Added dependency on R >= 3.5.0 because serialized objects in
serialize/load version 3 cannot be read in older versions of R.
File(s) containing such objects:
‘annotables/data/bdgp6.rda’ ‘annotables/data/bdgp6_tx2gene.rda’
‘annotables/data/ensembl_version.rda’ ‘annotables/data/galgal5.rda’
‘annotables/data/galgal5_tx2gene.rda’ ‘annotables/data/grch37.rda’
‘annotables/data/grch37_tx2gene.rda’ ‘annotables/data/grch38.rda’
‘annotables/data/grch38_tx2gene.rda’ ‘annotables/data/grcm38.rda’
‘annotables/data/grcm38_tx2gene.rda’ ‘annotables/data/mmul801.rda’
‘annotables/data/mmul801_tx2gene.rda’ ‘annotables/data/rnor6.rda’
‘annotables/data/rnor6_tx2gene.rda’ ‘annotables/data/wbcel235.rda’
‘annotables/data/wbcel235_tx2gene.rda’
─ building ‘annotables_0.1.91.tar.gz’
* installing *source* package ‘annotables’ ...
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (annotables)
install.packages("ggnewscale")
trying URL 'https://cran.rstudio.com/bin/macosx/contrib/4.2/ggnewscale_0.4.7.tgz'
Content type 'application/x-gzip' length 342456 bytes (334 KB)
==================================================
downloaded 334 KB
The downloaded binary packages are in
/var/folders/6y/jp6148xn7bj2wzbf3h_c4xt00000gn/T//RtmpNSybFS/downloaded_packages
library(biomaRt) #For conversion of transcript IDs to gene ID
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
library(annotables) #to retrieve grcm38 annotation for mouse genome
library(org.Mm.eg.db) #Mouse genome annotation
Loading required package: AnnotationDbi
Attaching package: ‘AnnotationDbi’
The following object is masked from ‘package:dplyr’:
select
library(DOSE)
DOSE v3.22.0 For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
If you use DOSE in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609
library(pathview)
##############################################################################
Pathview is an open source software package distributed under GNU General
Public License version 3 (GPLv3). Details of GPLv3 is available at
http://www.gnu.org/licenses/gpl-3.0.html. Particullary, users are required to
formally cite the original Pathview paper (not just mention it) in publications
or products. For details, do citation("pathview") within R.
The pathview downloads and uses KEGG data. Non-academic uses may require a KEGG
license agreement (details at http://www.kegg.jp/kegg/legal.html).
##############################################################################
library(clusterProfiler)
clusterProfiler v4.4.1 For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
If you use clusterProfiler in published research, please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141
Attaching package: ‘clusterProfiler’
The following object is masked from ‘package:AnnotationDbi’:
select
The following object is masked from ‘package:biomaRt’:
select
The following object is masked from ‘package:IRanges’:
slice
The following object is masked from ‘package:S4Vectors’:
rename
The following object is masked from ‘package:stats’:
filter
library(AnnotationHub)
Loading required package: BiocFileCache
Loading required package: dbplyr
Attaching package: ‘dbplyr’
The following objects are masked from ‘package:dplyr’:
ident, sql
Registered S3 methods overwritten by 'htmltools':
method from
print.html tools:rstudio
print.shiny.tag tools:rstudio
print.shiny.tag.list tools:rstudio
Attaching package: ‘AnnotationHub’
The following object is masked from ‘package:Biobase’:
cache
library(ensembldb)
Loading required package: GenomicFeatures
Loading required package: AnnotationFilter
Attaching package: 'ensembldb'
The following object is masked from 'package:clusterProfiler':
filter
The following object is masked from 'package:dplyr':
filter
The following object is masked from 'package:stats':
filter
library(tidyverse)
── Attaching packages ──────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.6 ✔ purrr 0.3.4
✔ tibble 3.1.7 ✔ stringr 1.4.0
✔ tidyr 1.2.0 ✔ forcats 0.5.1
✔ readr 2.1.2
── Conflicts ─────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::collapse() masks IRanges::collapse()
✖ dplyr::combine() masks Biobase::combine(), BiocGenerics::combine()
✖ dplyr::count() masks matrixStats::count()
✖ dplyr::desc() masks IRanges::desc()
✖ tidyr::expand() masks S4Vectors::expand()
✖ ensembldb::filter() masks clusterProfiler::filter(), dplyr::filter(), stats::filter()
✖ dplyr::first() masks S4Vectors::first()
✖ dbplyr::ident() masks dplyr::ident()
✖ dplyr::lag() masks stats::lag()
✖ ggplot2::Position() masks BiocGenerics::Position(), base::Position()
✖ purrr::reduce() masks GenomicRanges::reduce(), IRanges::reduce()
✖ clusterProfiler::rename() masks dplyr::rename(), S4Vectors::rename()
✖ ensembldb::select() masks clusterProfiler::select(), AnnotationDbi::select(), biomaRt::select(), dplyr::select()
✖ purrr::simplify() masks clusterProfiler::simplify()
✖ clusterProfiler::slice() masks dplyr::slice(), IRanges::slice()
✖ dbplyr::sql() masks dplyr::sql()
library(ggnewscale)
grcm38
res1_ordered
log2 fold change (MLE): Genotype rtg4510 vs WT_TG
Wald test p-value: Genotype rtg4510 vs WT_TG
DataFrame with 25644 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000079037 14497.968 1.262646 0.0659797 19.1369 1.24473e-81 2.38740e-77
ENSMUSG00000021171 692.285 -1.037739 0.0642636 -16.1482 1.17004e-58 1.12207e-54
ENSMUSG00000056553 4190.567 -1.151315 0.0796032 -14.4632 2.07048e-47 1.32373e-43
ENSMUSG00000018411 7052.860 0.863156 0.0677374 12.7427 3.42466e-37 1.64213e-33
ENSMUSG00000000805 334.382 1.055046 0.0836956 12.6057 1.96305e-36 7.53026e-33
... ... ... ... ... ... ...
ENSMUSG00000100419 13.060758 0 0.764635 0 1 1
ENSMUSG00000101153 0.357632 0 1.197851 0 1 NA
ENSMUSG00000103043 0.168415 0 2.436069 0 1 NA
ENSMUSG00000103706 0.655486 0 1.022463 0 1 NA
ENSMUSG00000104658 0.120422 0 2.521910 0 1 NA
idx <- grcm38$ensgene %in% rownames(res1_ordered)
head(idx)
[1] TRUE FALSE TRUE TRUE TRUE TRUE
non_duplicates <- which(duplicated(ids$ensgene) == FALSE)
ids <- ids[non_duplicates, ]
res_tableOE_tb <- res1_ordered_filtered %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
as_tibble()
library("dplyr")
library("annotables")
res_ids = inner_join(res_tableOE_tb, ids, by=c("gene"="ensgene"))
res_ids
sigOE_genes <- as.character(sigOE$ensgene)
Unknown or uninitialised column: `ensgene`.
## Run GO enrichment analysis
ego <- enrichGO(gene = sigOE_genes,
universe =ids$ensgene ,
keyType = "ENSEMBL",
OrgDb = org.Mm.eg.db,
ont = "BP",
pAdjustMethod = "BH",
qvalueCutoff = 0.05,
readable = TRUE,
pool=TRUE)
## Output results from GO analysis to a table
cluster_summary <- data.frame(ego)
dotplot(ego, showCategory=50)
xyz = pairwise_termsim(ego)
Error in pairwise_termsim(ego) :
could not find function "pairwise_termsim"
Because we have transcript IDs, we want ensembl ids in order to be able to conduct GO Term enrichment we need ensembl ID
In order to conduct the hypergeometric test on each set of differentially expressed genes we need to have two sets of genes: a background set, and a significant differentially expressed gene set. The background set will be comprised of all differentially expressed genes and the genes of interest will be those with significant p values (0.05).
We are using ‘ALL’ for ontology because we want to see all of the differentially expressed genes accross all categories. We are using the bonferroni correction.
## Use mouse genome
ego <- enrichGO(gene = rownames(res1_ordered_filtered),
universe =ids$ensgene ,
keyType = "ENSEMBL",
OrgDb = org.Mm.eg.db,
ont = "ALL",
pAdjustMethod = "BH",
qvalueCutoff = 0.05,
readable = TRUE,
pool=TRUE)
## Output results from GO analysis to a table
enriched_genes_res1 <- data.frame(ego)
dotplot(ego, showCategory=50)
OE_foldchanges <- res1_ordered_filtered$log2FoldChange
OE_foldchanges<-head(OE_foldchanges,150)
names(OE_foldchanges) <- rownames(head(res1_ordered_filtered,150))
cnetplot(ego,
categorySize="pvalue",
showCategory = 5,
foldChange=OE_foldchanges,
vertex.label.font=2)
Scale for 'size' is already present. Adding another scale for 'size', which will replace the existing scale.
emapplot(ego, showCategory = 50)
Error in has_pairsim(x) :
Term similarity matrix not available. Please use pairwise_termsim function to deal with the results of enrichment analysis.